A New HDFS Structure Model to Evaluate the Performance of Word Count Application on Different File Size

Authors

  • Mohammad Badrul Alam Miah
  • Mehedi Hasan
  • Md. Kamal Uddin
  • Chuck Lam
  • Michael G. Noll
  • N. Mirajkar
  • S. Bhujbal
Abstract

MapReduce is a powerful distributed processing model for large datasets, and Hadoop is an open-source framework that implements MapReduce. The Hadoop Distributed File System (HDFS) has become very popular for building large-scale, high-performance distributed data processing systems. HDFS is designed mainly to handle large files, so processing massive numbers of small files is a challenge for native HDFS. This paper introduces an approach to optimize the performance of processing massive numbers of small files on HDFS. We design a new HDFS structure model whose main idea is to merge small files at the source and write them directly into a merged file. Experimental results show that the proposed scheme can effectively improve the storage and access efficiency of massive small files on HDFS.
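As a rough illustration of the merging idea, the sketch below packs a directory of small local files into a single merged file on HDFS. This is only a minimal sketch under assumptions of our own: the class name SmallFileMerger, the command-line arguments, and the use of Hadoop's SequenceFile (original file name as key, raw bytes as value) as the merged container are illustrative choices, not the structure model actually proposed in the paper.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

import java.io.File;
import java.nio.file.Files;

// Hypothetical sketch: pack every small file in a local directory into one
// merged SequenceFile on HDFS, keyed by the original file name.
public class SmallFileMerger {
    public static void main(String[] args) throws Exception {
        File localDir = new File(args[0]);   // directory of small source files
        Path mergedFile = new Path(args[1]); // target merged file on HDFS

        Configuration conf = new Configuration();
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(mergedFile),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class));
        try {
            for (File f : localDir.listFiles()) {
                if (!f.isFile()) {
                    continue;
                }
                // Each small file becomes one record in the merged file.
                byte[] bytes = Files.readAllBytes(f.toPath());
                writer.append(new Text(f.getName()), new BytesWritable(bytes));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}

A MapReduce job would then read the merged container with SequenceFileInputFormat, so the NameNode tracks one file instead of thousands of small ones.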


Similar Articles

Generalization of Dynamic Two Stage Models in DEA: An Application in Saderat Bank

Dynamic network data envelopment analysis (DNDEA) has attracted a lot of attention in recent years. On the one hand, the available DNDEA models evaluate the performance of a DMU with interrelated processes over specified multiple periods, but on the other hand, they can only measure the efficiency of a dynamic network structure when a supply chain structure is present. For example, in the banking in...


Development of a Non-Radial Network Model to Evaluate the Performance of a Multi-Stage Sustainable Supply Chain

Abstract: The purpose of this paper is to present a new non-radial data envelopment analysis model that is able to evaluate systems with complete network structures; one such network is the supply chain of the cement industry. In this paper, using a non-radial model in data envelopment analysis, a model with a network structure that can assess the sustainable supply chain of strategic industries is evaluated...


Live Website Traffic Analysis Integrated with Improved Performance for Small Files using Hadoop

Hadoop, an open-source Java framework, deals with big data. It has HDFS (Hadoop Distributed File System) and MapReduce. HDFS is designed to handle large files across clusters and suffers a performance penalty when dealing with a large number of small files. These large numbers of small files place a heavy burden on the NameNode of HDFS and increase the execution time of MapReduce. Secondly, ...
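The excerpt above stops before describing its remedy, so the following is only a generic, assumed illustration relevant to a word-count workload like the one evaluated in this paper: a standard MapReduce word-count driver that uses Hadoop's stock CombineTextInputFormat, so that many small input files are packed into a few combined splits instead of one map task per file. The class names and the 128 MB split limit are hypothetical choices for the example.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallFileWordCount {

    // Standard word-count mapper: emit (word, 1) for every token in a line.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Standard word-count reducer: sum the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count over many small files");
        job.setJarByClass(SmallFileWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Pack many small files into combined splits (here at most 128 MB each)
        // so a single map task reads more than one tiny file.
        job.setInputFormatClass(CombineTextInputFormat.class);
        CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}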


Data-intensive file systems for Internet services: A rose by any other name...

Data-intensive distributed file systems are emerging as a key component of large scale Internet services and cloud computing platforms. They are designed from the ground up and are tuned for specific application workloads. Leading examples, such as the Google File System, Hadoop distributed file system (HDFS) and Amazon S3, are defining this new purpose-built paradigm. It is tempting to classif...


Data-intensive File Systems for Internet Services: A Rose by Any Other Name... (CMU-PDL-08-114)

Data-intensive distributed file systems are emerging as a key component of large scale Internet services and cloud computing platforms. They are designed from the ground up and are tuned for specific application workloads. Leading examples, such as the Google File System, Hadoop distributed file system (HDFS) and Amazon S3, are defining this new purpose-built paradigm. It is tempting to classif...




Journal title:

Volume   Issue

Pages  -

Publication date: 2015